Memory reference reuse latency: Accelerated warmup for sampled microarchitecture simulation
Copyright © 2003 IEEE. Published in the Proceedings of the 2003 International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2003, Austin, Texas. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 908-562-3966.

Abstract

This paper proposes to speed up sampled microprocessor simulations by reducing warmup times without sacrificing simulation accuracy. It exploits the observation that, of the memory references that precede a sample cluster, those that occur nearest to the cluster are more likely to be germane to the execution of the cluster itself. Hence, while modeling all cache and branch predictor interactions that precede a sample cluster would reliably establish their state, doing so is overkill and leads to long-running simulations. Instead, simulated cache and branch predictor state can be established accurately and quickly by modeling only a subset of the memory references and control-flow instructions immediately preceding a sample cluster. Our technique measures memory reference reuse latencies (MRRLs), i.e. the number of completed instructions between consecutive references to each unique memory location, and uses these data to choose a point prior to each cluster at which to engage cache hierarchy and branch predictor modeling.
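The MRRL measurement and the choice of a warmup start point can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the trace format (pairs of completed-instruction count and address), the function names `reuse_latencies` and `warmup_start`, and the rule of backing off from the cluster by a high percentile (99.9% by default) of the observed reuse latencies are all hypothetical choices made here for clarity.

```python
import math


def reuse_latencies(trace):
    """Return memory reference reuse latencies (MRRLs) for a reference trace.

    `trace` is an iterable of (instr_count, address) pairs in program order,
    where instr_count is the number of instructions completed so far
    (a hypothetical trace format). For every reference to an address seen
    before, the latency is the instruction-count distance to the previous
    reference to that same address. First-time references have no latency.
    """
    last_seen = {}  # address -> instruction count of its most recent reference
    latencies = []
    for instr_count, address in trace:
        if address in last_seen:
            latencies.append(instr_count - last_seen[address])
        last_seen[address] = instr_count
    return latencies


def warmup_start(cluster_start, latencies, percentile=99.9):
    """Pick the instruction at which to begin cache/branch predictor modeling.

    Hypothetical rule: back off from the cluster start by the given
    percentile (nearest-rank method) of the observed reuse latencies, so
    that most reuses feeding the cluster fall inside the warmup window.
    """
    if not latencies:
        return cluster_start  # no reuse observed: nothing to warm up
    ordered = sorted(latencies)
    rank = math.ceil(percentile / 100 * len(ordered))
    rank = min(max(rank, 1), len(ordered))  # clamp to a valid 1-based rank
    distance = ordered[rank - 1]
    return max(0, cluster_start - distance)
```

For the toy trace `[(0, 0x10), (3, 0x20), (5, 0x10), (9, 0x20), (10, 0x10)]`, `reuse_latencies` returns `[5, 6, 5]`, and `warmup_start(100, [5, 6, 5])` returns 94, so modeling would begin 6 instructions before a cluster starting at instruction 100. Raising the percentile moves the start point earlier (more warmup, slower); lowering it moves the start later (faster, but more references may find cold state).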
By starting cache and branch predictor modeling late in the pre-cluster instruction stream, we were able to reduce overall simulation running times by an average of 90.62% of the maximum potential speedup (which would be achieved by performing no pre-cluster warmup at all), while introducing an average IPC error of less than 1%. Both figures are measured relative to a baseline that warms up all pre-cluster cache and branch predictor interactions.
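The two headline figures can be written as formulas. The notation is ours and is one plausible reading of the abstract rather than the paper's own definitions: T_full is the running time when all pre-cluster cache and branch predictor interactions are modeled, T_none is the running time with no pre-cluster warmup, T_MRRL is the running time when modeling starts at the MRRL-selected point, and the IPC values are labeled the same way.

```latex
F = \frac{T_{\text{full}} - T_{\text{MRRL}}}{T_{\text{full}} - T_{\text{none}}},
\qquad
\varepsilon_{\text{IPC}} = \frac{\lvert \mathrm{IPC}_{\text{MRRL}} - \mathrm{IPC}_{\text{full}} \rvert}{\mathrm{IPC}_{\text{full}}}
```

Under this reading, F = 1 would mean MRRL-based warmup runs as fast as skipping warmup entirely and F = 0 that it is no faster than full warmup; the abstract reports an average F of 0.9062 and an average IPC error below 1%.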
Similar resources
BLRL: Accurate and Efficient Warmup for Sampled Processor Simulation
Current computer architecture research relies heavily on architectural simulation to obtain insight into the cycle-level behavior of modern microarchitectures. Unfortunately, such architectural simulations are extremely time-consuming. Sampling is an often-used technique to reduce the total simulation time. This is achieved by selecting a limited number of samples from a complete benchmark exec...
Memory Reference Reuse Latency: Accelerated Sampled Microarchitecture Simulation
This paper explores techniques for speeding up sampled microprocessor simulations by exploiting the observation that of the memory references that precede a sample, references that occur nearest to the sample are more likely to be germane during the sample itself. This means that accurately warming up simulated cache and branch predictor state only requires that a subset of the memory reference...
Accurate and Efficient Cache Warmup for Sampled Processor Simulation Through NSL-BLRL
Architectural simulation is extremely time-consuming given the huge number of instructions that need to be simulated for contemporary benchmarks. Sampled simulation that selects a number of samples from the complete benchmark execution yields substantial speedups. However, there is one major issue that needs to be dealt with in order to minimize non-sampling bias, namely the hardware state at t...
Branch History Matching: Branch Predictor Warmup for Sampled Simulation
Computer architects and designers rely heavily on simulation. The downside of simulation is that it is very time-consuming — simulating an industry-standard benchmark on today’s fastest machines and simulators takes several weeks. A practical solution to the simulation problem is sampling. Sampled simulation selects a number of sampling units out of a complete program execution and only simulat...
Efficient Sampling Startup for Sampled Processor Simulation
Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry standard benchmark can take weeks to months. Statistical sampling and sample techniques like SimPoint that pick small sets of execution samples have been shown to provide accurate results while significantly reducing simulation time. The inefficiencies in sampling are (a) ne...